R Markdown combines formatted text, code, and outputs in one document! Great for reproducibility: fewer opportunities for mistakes.

First, we attach packages (in the code chunk above).

If you need to install a package, go to the console and type install.packages("packagename").
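
A minimal sketch for installing only the packages that are missing. The package list is an assumption, inferred from the functions used below (read_csv() from the tidyverse, here(), the sf functions, and tmap), so adjust it to match your own setup chunk. missing_packages() and install_if_missing() are hypothetical helper names, not functions from any package.

```r
# Hypothetical helper: which of these packages are not installed yet?
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing ones (run once, in the console)
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Assumed package list for this document (adjust as needed)
needed <- c("tidyverse", "here", "sf", "tmap")
```

Then run install_if_missing(needed) in the console once, and keep the library() calls in the setup chunk so the document doesn't reinstall anything every time it knits.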

Read in the data. (In RStudio, Command + Option + I inserts a new code chunk and Command + Enter runs the current line or selection.)

Headers are made with # signs: a single # gives the largest header, and ###### (six) gives the smallest header.
sf_trees <- read_csv(here("data", "sf_trees", "sf_trees.csv"))
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   tree_id = col_double(),
##   legal_status = col_character(),
##   species = col_character(),
##   address = col_character(),
##   site_order = col_double(),
##   site_info = col_character(),
##   caretaker = col_character(),
##   date = col_date(format = ""),
##   dbh = col_double(),
##   plot_size = col_character(),
##   latitude = col_double(),
##   longitude = col_double()
## )
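
The column specification above is readr guessing each column's type. A minimal sketch of spelling the types out with col_types instead, so they are fixed and the message isn't printed. The tiny CSV here is made up and written to a temporary file; only the column names and types come from the specification above.

```r
library(readr)

# Write a tiny, made-up CSV using a few of the sf_trees column names
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "tree_id,legal_status,species,dbh",
  "1,Permitted Site,Acacia melanoxylon :: Blackwood Acacia,6",
  "2,DPW Maintained,Tree(s) ::,"
), tmp)

# Give the column types explicitly instead of letting readr guess them
trees_small <- read_csv(tmp, col_types = cols(
  tree_id = col_double(),
  legal_status = col_character(),
  species = col_character(),
  dbh = col_double()
))
```

With the real file, the same col_types = cols(...) argument can be added to the read_csv(here(...)) call above. The empty dbh field in the second made-up row is read as NA.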

Basic wrangling reminders

Refresh our data wrangling skills!

Find the top 5 legal statuses by number of tree observations, then do some wrangling and make a graph.

(Command + Shift + M inserts the pipe operator, %>%.)

top_5_status <- sf_trees %>% 
  count(legal_status) %>% 
  drop_na(legal_status) %>% 
  rename(tree_count = n) %>% 
  relocate(tree_count) %>% 
  slice_max(tree_count, n = 5)

# count() combines group_by(), summarize(), and n() in one step. Super useful!
# drop_na() removes any rows with a missing (NA) value in the variable(s) you specify
# rename(): the new name goes first, then the old name
# relocate(): moves tree_count to the first column
# slice_max(): finds the rows with the highest values of the variable you specify and keeps only the top n (here, 5); ties are kept by default, so this can return more than 5 rows
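
A toy sketch (made-up data, not sf_trees) checking two of the claims in the comments: count() gives the same counts as group_by() + summarize(n = n()), and slice_max() keeps ties by default, so asking for the top n can return more than n rows. That would matter here if two legal statuses tied for fifth place.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for sf_trees with a single legal_status column
toy <- tibble(legal_status = c("A", "A", "A", "B", "B", "C", "D", NA, NA))

# count() gives the same counts as group_by() + summarize(n = n())
via_count <- toy %>% count(legal_status)
via_summarize <- toy %>%
  group_by(legal_status) %>%
  summarize(n = n())

# Same wrangling steps as above, on the toy counts
counts <- via_count %>%
  drop_na(legal_status) %>%
  rename(tree_count = n)

# C and D are tied at 1, so the "top 3" keeps both of them (4 rows)...
top_3_ties <- counts %>% slice_max(tree_count, n = 3)
# ...unless ties are turned off, which returns exactly 3 rows
top_3_strict <- counts %>% slice_max(tree_count, n = 3, with_ties = FALSE)
```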

Make a graph of the top 5 legal statuses by tree count.

ggplot(data = top_5_status, aes(x = fct_reorder(legal_status, tree_count), y = tree_count)) +
  geom_col() +
  labs(x = "Legal Status", y = "Tree Count", title = "Test Title") +
  coord_flip() +
  theme_minimal()
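
Why fct_reorder(): ggplot2 draws a categorical axis in factor-level order, which is alphabetical for a plain character column. A small sketch with made-up statuses and counts showing that fct_reorder() sorts the levels by another variable (ascending by default), so after coord_flip() the longest bar ends up at the top.

```r
library(forcats)

# Made-up statuses and counts, in arbitrary order
status <- c("Permitted Site", "DPW Maintained", "Undocumented")
n_trees <- c(30, 100, 5)

# Levels are reordered by n_trees, smallest first
status_f <- fct_reorder(status, n_trees)
levels(status_f)
```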

A few more data wrangling refresher examples!!

We only want to keep observations (rows) for Blackwood Acacia trees. There's no separate column for scientific and common names, but we can use filter() with str_detect() to keep every row whose species contains "Blackwood Acacia".

blackwood <- sf_trees %>% 
  filter(str_detect(species, "Blackwood Acacia")) %>% 
  select(legal_status, date, latitude, longitude)

ggplot(data = blackwood, aes(x = longitude, y = latitude)) +
  geom_point()
## Warning: Removed 27 rows containing missing values (geom_point).

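# The warning above means 27 rows have missing latitude/longitude, so those trees aren't plotted
# str_detect() (string detect) checks whether each value of the variable we specify contains a given string
# select() picks which columns to keep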
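
str_detect() returns TRUE or FALSE for each value, and filter() keeps the TRUE rows. A small sketch on made-up species strings; note that the pattern is treated as a regular expression and matching is case sensitive unless you ask otherwise.

```r
library(stringr)

# Made-up species strings in a "scientific :: common" style
species <- c(
  "Acacia melanoxylon :: Blackwood Acacia",
  "Platanus x hispanica :: Sycamore: London Plane",
  "acacia melanoxylon :: blackwood acacia"
)

# Case-sensitive by default: the all-lowercase value does not match
exact <- str_detect(species, "Blackwood Acacia")

# Wrapping the pattern in regex(ignore_case = TRUE) matches any capitalization
any_case <- str_detect(species, regex("blackwood acacia", ignore_case = TRUE))
```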

Use tidyr::separate() and tidyr::unite(), which are useful for splitting one column into several or combining several columns into one.

sf_trees_sep <- sf_trees %>% 
  separate(species, into = c("spp_sci", "spp_common"), sep = "::")
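
Because sep = "::" splits exactly at the two colons, any spaces around them stay attached to the pieces: for a value formatted like "Acacia melanoxylon :: Blackwood Acacia", spp_sci would end in a space and spp_common would start with one. A sketch on made-up values that trims that whitespace afterwards with stringr::str_trim().

```r
library(dplyr)
library(tidyr)
library(stringr)

# Made-up species values in a "scientific :: common" style
toy <- tibble(species = c(
  "Acacia melanoxylon :: Blackwood Acacia",
  "Arbutus 'Marina' :: Hybrid Strawberry Tree"
))

toy_sep <- toy %>%
  separate(species, into = c("spp_sci", "spp_common"), sep = "::") %>%
  # Remove the leading/trailing spaces left around the "::" separator
  mutate(across(c(spp_sci, spp_common), str_trim))
```

separate() still works, though newer tidyr versions mark it as superseded by separate_wider_delim().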

Example of unite(), which combines several columns into one (not sure why we'd do this here!):

sf_trees_unite <- sf_trees %>% 
  unite("id_status", tree_id:legal_status, sep = "!!!!!")
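
By default unite() drops the columns it combines. A sketch with made-up rows using a plainer separator and remove = FALSE, which keeps the original columns alongside the new one (handy if the combined column is only meant as a readable label).

```r
library(dplyr)
library(tidyr)

# Made-up rows with the two columns used above
toy <- tibble(
  tree_id = c(1, 2),
  legal_status = c("Permitted Site", "DPW Maintained")
)

# Paste the two columns together with "_" and keep the originals
toy_unite <- toy %>%
  unite("id_status", tree_id, legal_status, sep = "_", remove = FALSE)
```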

Make some actual maps of Blackwood Acacia trees in SF.

We'll use st_as_sf() to convert the longitude and latitude values into spatial point coordinates (longitude goes first, since it is the x coordinate).

blackwood_spatial <- blackwood %>% 
  drop_na(longitude, latitude) %>% 
  st_as_sf(coords = c("longitude", "latitude"))

# Set the CRS to EPSG 4326 (WGS 84 longitude/latitude)
st_crs(blackwood_spatial) <- 4326

ggplot(data = blackwood_spatial) +
  geom_sf(color = "darkgreen") +
  theme_minimal()

# geom_sf() plots spatial (sf) data in ggplot, once we've set the coordinate reference system
# But this is still hard to interpret...
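
A minimal sketch on two made-up points (not real tree locations) showing the same conversion with the CRS set inside st_as_sf() through its crs argument; this gives the same result as assigning it afterwards with st_crs().

```r
library(sf)

# Two made-up points near San Francisco (longitude, latitude in degrees)
pts_df <- data.frame(
  longitude = c(-122.45, -122.41),
  latitude = c(37.77, 37.79)
)

# coords are given as c(x, y), i.e. longitude first, then latitude;
# crs = 4326 sets the coordinate reference system in the same call
pts <- st_as_sf(pts_df, coords = c("longitude", "latitude"), crs = 4326)
```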

Read in the SF roads shapefile so the map makes more sense!

sf_map <- read_sf(here("data", "sf_map", "tl_2017_06075_roads.shp"))

# The roads and the trees need to be in the same coordinate reference system (CRS)!
# The roads shapefile already has a CRS, so we reproject it with st_transform()
# to EPSG 4326 (matching blackwood_spatial) and save the result
sf_map <- st_transform(sf_map, 4326)
# sf_map contains 4,087 road features (LINESTRING geometries) with 4 fields
ggplot(data = sf_map) +
  geom_sf()
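
Why st_transform() rather than st_crs() for the roads: assigning a CRS only changes the label, while st_transform() actually recalculates the coordinates. A small sketch on one made-up point, reprojecting to Web Mercator (EPSG 3857, chosen here only for illustration).

```r
library(sf)

# One made-up point in WGS 84 longitude/latitude
pt <- st_as_sf(data.frame(x = -122.42, y = 37.77), coords = c("x", "y"), crs = 4326)

# st_transform() reprojects: the coordinates are recalculated (into metres for EPSG 3857)
pt_merc <- st_transform(pt, 3857)

# Replacing the CRS only relabels: the numbers stay in degrees, which is now wrong.
# sf warns that replacing the CRS does not reproject the data; silenced here.
pt_relabeled <- suppressWarnings(st_set_crs(pt, 3857))

st_coordinates(pt_merc)
st_coordinates(pt_relabeled)
```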

Now combine tree observations and roads map!

ggplot() +
  geom_sf(data = sf_map, size = 0.1, color = "darkgray") +
  geom_sf(data = blackwood_spatial, size = 0.4, color = "darkgreen") + 
  theme_void() +
  labs(title = "Blackwood Acacias in San Francisco")

Let’s make this interactive!!

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(blackwood_spatial) +
  tm_dots()